1 Introduction 

Implementing a linguistic theory raises (at least implicitly) the question of the interpretation level. 
[Evans87] distinguishes indirect, weak direct and strong direct interpretations: the first uses an 
intermediate mapping between the original theory and the grammars, whereas the last treats the 
grammars as directly characterising the language. In practice, the closer the correspondence between 
linguistic and computational models, the higher the directness level.¹ An indirect interpretation 
compiles the original formalism into another one (e.g. an HPSG grammar into a simple phrase 
structure one) in order to apply traditional parsing techniques. A strong direct interpretation 
implements parsing mechanisms as described in the theory. 

We argue in this paper that high-level languages can provide a close correspondence between the 
theory and its implementation. We address in particular the question of constraint implementation 
and show how constraint logic programming (and more particularly multi-paradigm languages 
such as LIFE) constitutes an efficient implementation framework. This paper describes how such 
a direct approach preserves the fundamental properties of the linguistic theory. 

2 Indirect vs. Direct Interpretations 

Most implementations rely on indirect or weak direct interpretations and generally compile 
the original formalism into Prolog clauses (see for example [Carpenter93], [Götz95] or [Popowich91]). 
We can distinguish two different approaches according to the level of the implementation. One 
method consists of implementing the parser in a high-level language, relying on mechanisms 
as close as possible to the theory. In this case, the language is used both for knowledge 
representation (i.e. coding the grammar) and for the implementation of the parsing mechanisms. The 
second approach proposes specific languages for representing grammars and generates parsers in 
a low-level language.² Figure 1 presents these approaches. 



We think that there is a deep difference between them from several points of view, in particular 
concerning faithfulness, generality and control. Our argument relies on the observation of 
the parsers' architecture and more particularly on the specification of the mechanisms required by 
HPSG. We distinguish the fundamental characteristics of a theory from the corresponding operational 
concepts. For HPSG, the basic notions are universal principles, structure sharing, sort 
hierarchy, well-typedness, etc. Their implementation requires specific mechanisms, in particular 
unification, constraint propagation, constraint satisfaction, inheritance, underspecification, 
delayed evaluation, etc. The correspondence between these two levels seems to be very important for 
both linguistic and computational reasons. 

Figure 1: Systems Architecture 

¹ In the same perspective, [Fong91] completes directness with the notion of faithfulness. 

² In the case of ALE, the parser is generated in Prolog but uses low-level instructions. In fact, as 
proposed by the authors (see [Carpenter94]), the host language should be C. 

2.1 Faithfulness 

Faithfulness, and more precisely the correspondence between the theoretical model and its 
implementation, can be considered an informal criterion and evaluated from a formal and an operational 
point of view. From a formal point of view, a faithful approach applies the parsing algorithm 
directly to the original grammar, whereas an indirect interpretation generates a grammar relying 
on another formalism. This difference is not purely aesthetic: a direct approach implements and 
validates the model. 

From the operational point of view, we think that faithfulness preserves in the implementation 
the properties of the theory itself. Let us make this aspect more precise for HPSG by focusing on two 
important characteristics: generality and integration. 

• Generality concerns the ability to represent universal phenomena. This property is in 
fact closely related to reusability, which concerns both the linguistic level (reusing different 
grammars) and the computational one (reusing the same parser for different grammars). 

A faithful implementation preserves the generality level of the mechanisms (and their 
reusability) in the sense that if the theory describes a general high-level property (e.g. universal 
principles), then the corresponding mechanism is at the same level (e.g. active constraints on 
types). This entails a distinction between the architecture of the feature structures and the 
constraints that they must satisfy: on the one hand, a feature structure must be totally 
well-typed (architectural property) and correspond to an ID-schema; on the other hand, it must 
satisfy the universal principles. A faithful implementation using active constraints (in the 
constraint programming sense) allows such a distinction, whereas a Prolog implementation 
cannot separate these levels: in this case, principle verification is evaluated after the 
instantiation of the feature structure concerned (for this reason, as in ALE, universal principles 
often belong to the description of the ID-schemata). 

• The second property, integration, concerns the ability to represent in a homogeneous 
way different sources of linguistic knowledge: prosody, phonology, morphology, syntax, 
semantics, etc. This property, like generality, relies on the distinction between structures 
and constraints: integrated approaches must represent various kinds of information within 
a single structure. The relations between these kinds of information are described using constraints. 
But there is another important aspect concerning the dynamic nature of these relations. Indeed, 
an integrated approach must describe mutual dependencies between the different levels of 
information. These relations have consequences for the structure itself (in particular via 
structure sharing), but also for the processes constructing the corresponding structure. For 
example, syntactic information can have consequences at the phoneme recognition level. 
This characteristic entails an on-line process and a direct manipulation of the original 
structures. 

2.2 Control 

The control problem constitutes another divergent criterion between direct and indirect approaches. 
Several aspects can be underlined. 

The first point is theoretical and concerns the system architecture. Figure 1 indicates 
that, in the case of an indirect system, the grammar developer encodes the grammar in a specific 
formalism. The compiler then generates the parser itself. The parser is treated as a black box: its 
semantics is not accessible to the grammar developer, who has no control over the parser itself. In 
the case of direct approaches, the developer knows the semantics of the language and has direct 
control over the parser. 

The second aspect concerns more precisely the implementation level. The current state of 
parsing technologies shows that we need a clear distinction between the linguistic knowledge 
(including architectural aspects) and the implementation mechanisms. Practically, as shown in 
Figure 1, this separation is present in the parsers. The problem is that, in the case of an 
indirect approach, it is not always possible to apply such a distinction. This is 
the case for the application of the principles. Their possible presence in the ID-schemata is only 
justified by the fact that Prolog cannot directly represent constraints on types and must verify 
such properties on instantiated objects. By contrast, a language providing active constraints 
on types allows the declaration of such constraints a priori, in a global and persistent way. 
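
The idea of declaring a constraint on a type once, globally, rather than inside each rule can be 
illustrated with a small sketch. It is written in Python, not in LIFE or Prolog, and everything in 
it is an assumption made for illustration: the registry SORT_CONSTRAINTS, the Term class, and the 
flattened feature names head and head_dtr_head do not come from any of the systems cited here. 

```python
SORT_CONSTRAINTS = {}   # sort name -> checks declared once, globally


def constrain(sort):
    """Register a check that every term of the given sort must satisfy."""
    def register(check):
        SORT_CONSTRAINTS.setdefault(sort, []).append(check)
        return check
    return register


class Term:
    """A toy feature structure: a sort plus a flat dictionary of features."""

    def __init__(self, sort, **features):
        self.sort = sort
        self.features = {}
        for name, value in features.items():
            self.set(name, value)

    def set(self, name, value):
        """Add a feature; undo the update if a constraint on the sort fails."""
        previous = dict(self.features)
        self.features[name] = value
        for check in SORT_CONSTRAINTS.get(self.sort, []):
            if not check(self):
                self.features = previous
                raise ValueError(f"{self.sort}: constraint violated by {name}")


@constrain("phrase")
def head_feature_principle(term):
    """Hypothetical flattened HFP: both head values agree once both are known."""
    mother = term.features.get("head")
    daughter = term.features.get("head_dtr_head")
    return mother is None or daughter is None or mother == daughter
```

In this sketch the principle is stated once for the sort phrase and is consulted at every update of 
any phrase term, even a partially specified one, so no rule has to mention it explicitly. 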

We can generalize this last remark to the adequacy between the mechanisms required by HPSG 
and those actually implemented. If the host language of the system doesn't provide adequate 
mechanisms, they are simulated (e.g. inheritance becomes an inference process). 

Finally, indirect approaches need an interpretation of both formalisms and mechanisms. We 
think that before the efficiency problem (which is the main argument for indirect approaches), 
the actual problem for the implementation concerns the preservation of the theory's generality (of 
great importance in particular concerning reusability and maintenance). 

3 A Constraint Logic Programming Solution 

As described in the previous section, coding a grammar concerns knowledge representation but 
also the interpretation of the implicit mechanisms of the theory. The question is now: is it possible 
to directly code an HPSG grammar in a high-level language, or must we use a specific language? We 
describe in this section a solution proposed within the constraint logic programming paradigm. 

3.1 Active constraints 

HPSG generally considers constraints as descriptions that a well-formed object must satisfy (see 
[Carpenter92]). In this definition, the notion of constraint is very precise and restrictive in 
comparison with the traditional sense in linguistics. But it has a direct interpretation within the 
constraint programming paradigm as an active constraint. We present this notion in this section 
and compare it with traditional approaches. 

The classical evaluation method in logic programming relies on generate-and-test: it generates 
variable values before verifying their properties. Obviously, a value can be controlled by unification 
with the head of the predicate, but it is impossible to evaluate properties (i) before the unification 
itself and (ii) if the object represented by the variable is only partially known. 

Active constraints can implement some of these properties and reduce the search space by 
applying them a priori: substitutions are allowed only if the constraint system remains consistent. 
In the constraint logic programming paradigm, the constraint satisfaction mechanism replaces 
unification: each resolution step verifies the satisfiability of the constraint system and simplifies it. 
Binding a variable adds new constraints to this system. In other words, the classical method in 
Prolog uses a single kind of constraint (unification), whereas a constraint-based approach allows 
the definition of complex ones with a global scope. 

For HPSG, a direct interpretation consists of implementing principles as active constraints. 
This approach allows a clear distinction between the basic parsing mechanisms and the 
control level. The parse level consists of determining the possible relations (basically the 
valency), whereas constraints verify the well-formedness of the structure. Insofar as constraints are 
active, such a verification has two main characteristics: it is an on-line process and it doesn't 
need any extra resolution step. Indeed, a classical Prolog implementation explicitly verifies 
well-formedness using a set of predicates, whereas a constraint approach verifies the satisfiability 
of the constraint system and unifies two terms in a single resolution step. The 
evaluation of the constraint system's satisfiability clearly has a cost, but a lower one than a classical 
method because (i) the system can be simplified (whereas a classical resolution requires the evaluation 
of all the properties) and (ii) the search space is reduced a priori (this improves the control level). 
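
The difference between generate-and-test and a priori checking can be made concrete with a small 
sketch. It is written in Python rather than LIFE or Prolog, and the toy problem (three variables 
over the domain 1..5 with two ordering constraints), the function names and the counters are all 
assumptions chosen for illustration; it checks constraints at each binding but does not implement 
a full constraint solver. 

```python
from itertools import product

DOMAINS = {"x": range(1, 6), "y": range(1, 6), "z": range(1, 6)}

# Each constraint names the variables it needs and a test over their values.
CONSTRAINTS = [
    (("x", "y"), lambda x, y: x < y),
    (("y", "z"), lambda y, z: y < z),
]


def generate_and_test():
    """Build every complete assignment first, then verify the properties."""
    solutions, visited = [], 0
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        visited += 1                      # one complete assignment examined
        env = dict(zip(names, values))
        if all(test(*(env[v] for v in vs)) for vs, test in CONSTRAINTS):
            solutions.append(env)
    return solutions, visited


def consistent(env):
    """Check every constraint whose variables are all bound in env."""
    for vs, test in CONSTRAINTS:
        if all(v in env for v in vs) and not test(*(env[v] for v in vs)):
            return False
    return True


def active_constraints():
    """Check the constraints at each binding and prune as soon as one fails."""
    solutions, visited = [], 0
    names = list(DOMAINS)

    def extend(env, i):
        nonlocal visited
        if i == len(names):
            solutions.append(dict(env))
            return
        for value in DOMAINS[names[i]]:
            visited += 1                  # one individual binding attempted
            env[names[i]] = value
            if consistent(env):           # a binding is kept only if consistent
                extend(env, i + 1)
            del env[names[i]]

    extend({}, 0)
    return solutions, visited
```

On this toy problem both functions return the same 10 solutions; the first examines 125 complete 
assignments, while the second attempts 80 individual bindings because inconsistent partial 
assignments are never extended. 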



3.2 Implementation in LIFE 

The basic mechanisms required for a direct HPSG interpretation are in particular unification, 
constraint satisfaction and inheritance. As for knowledge representation, the basic objects are 
typed feature structures. The language LIFE (cf. [Ait-Kaci91]) implements all these requirements. 
It is a multi-paradigm language (functional, logic, constraint and object-oriented paradigms) which 
uses ψ-terms as basic objects. LIFE offers built-in inheritance together with constraint solving 
mechanisms: these characteristics make it possible (i) to constrain the terms and (ii) to control 
propagation. 

The following points are just a sketch illustrating the relevance of this framework for a direct 
interpretation. 



3.2.1 Principles 

In this language, constraints are expressed on the sorts: descriptions corresponding to principles 
are implemented directly in this way. Note that the formulation of these principles 
is very similar in all the systems implementing HPSG. The difference here doesn't concern the 
representation but the evaluation. We take here the case of two basic principles implemented as 
active constraints on the type phrase. Each term of this type must satisfy these constraints even 
if it is not instantiated. 



• HFP: 

    :: P:phrase | (P.synsem.loc.cat.head = X, 
                   P.dtrs.head-dtr.loc.cat.head = X). 

Such a constraint stipulates that, for every term of sort phrase, the head values concerned 
must refer to the same term (tagged by X). 

• Valency Principle: 

    P:phrase | (P.synsem.loc.cat.subj = X, 
                P.dtr.subj-dtr = Y, 
                P.dtr.head-dtr.synsem.loc.cat.subj = append(X,Y), 
                P.synsem.loc.cat.comps = U, 
                P.dtr.comp-dtrs = V, 
                P.dtr.head-dtr.synsem.loc.cat.comps = append(U,V)). 

This principle also has a classical formulation, very close to that proposed in other approaches, 
which is not surprising. What is interesting here is the use of the function append, which 
residuates if its arguments are insufficiently known. This constraint can therefore be applied 
a priori: the instantiation of one of the features concerned fires the evaluation of the 
function and installs the constraint. 

In a classical Prolog implementation, these principles are verified after the construction of each 
phrasal sign. In LIFE, these constraints are (automatically) satisfied at every moment by these 
terms. Satisfiability is not evaluated after the complete instantiation of the term but checked 
at each step from its creation onwards: inconsistencies are detected sooner than with classical 
generate-and-test approaches. 
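
The residuation behaviour described for append can be imitated in a few lines. The sketch below is 
in Python, not LIFE, and is only an approximation under stated assumptions: the classes Var and 
Inconsistent and the function residuating_append are hypothetical names, values are plain Python 
lists, and the goal simply waits until both arguments are fully bound, which is coarser than 
LIFE's own residuation. 

```python
class Inconsistent(Exception):
    """Raised when a binding contradicts an installed constraint."""


class Var:
    """A logic variable that may be unbound and can wake suspended goals."""

    def __init__(self, name):
        self.name = name
        self.bound = False
        self.value = None
        self.suspended = []   # goals waiting for this variable

    def bind(self, value):
        if self.bound:
            if self.value != value:
                raise Inconsistent(f"{self.name}: {self.value!r} != {value!r}")
            return
        self.bound, self.value = True, value
        waiting, self.suspended = self.suspended, []
        for goal in waiting:
            goal()            # re-evaluate the goals that were suspended


def residuating_append(x, y, result):
    """Constrain result = x + y, suspending while x or y is unknown."""
    def goal():
        missing = [v for v in (x, y) if not v.bound]
        if missing:
            # Not enough information yet: wait on the first unknown argument.
            missing[0].suspended.append(goal)
        else:
            result.bind(x.value + y.value)
    goal()
```

For instance, with unbound variables standing for the mother's subj list, the subject daughter and 
the head daughter's subj list, the constraint can be installed first; binding the first two later 
(say to [] and ["np"]) then fixes the third to ["np"], and a conflicting earlier value for the 
third raises Inconsistent at that moment rather than after the whole sign has been built. 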



3.2.2 Inheritance 

Inheritance relations allow the specification (and the propagation) of several properties. A 
well-formed sign must satisfy the principles together with these properties. 

Inheritance in LIFE can be seen as a constraint in the sense that it is integrated into the 
unification algorithm. This improves on a classical Prolog approach because, as shown in [Ait-Kaci86], 
inheritance is processed by unification steps instead of resolution ones. 

Practically, we directly implement the sort hierarchy using sort inheritance definitions as 
described in the theory. 

    lex <| sign.      phrase <| sign. 
    noun <| subst.    subst <| head. 

Since each sort can be constrained with a particular property, this mechanism can implement, 
as in the theory, complex descriptions relying on multiple inheritance. 
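
To show what treating inheritance inside unification amounts to, here is a minimal Python sketch 
(not LIFE) over the four declarations above. The names PARENTS, supersorts, is_subsort and 
unify_sorts are assumptions, and unify_sorts is deliberately simplified: it only handles two sorts 
that are comparable in the hierarchy, whereas a full treatment would compute a greatest lower 
bound under multiple inheritance. 

```python
# Immediate supersorts, following the four declarations given in the text.
PARENTS = {
    "lex": ["sign"],
    "phrase": ["sign"],
    "noun": ["subst"],
    "subst": ["head"],
}


def supersorts(sort):
    """Return the sort together with all of its ancestors."""
    seen, stack = set(), [sort]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def is_subsort(a, b):
    """True if a is b or inherits (directly or indirectly) from b."""
    return b in supersorts(a)


def unify_sorts(a, b):
    """Return the more specific of two comparable sorts, or None if they clash."""
    if is_subsort(a, b):
        return a
    if is_subsort(b, a):
        return b
    return None
```

Here unifying subst with noun yields noun directly from the hierarchy, without any extra inference 
step, while lex and phrase fail to unify because neither inherits from the other. 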



3.2.3 Sort Resolution and Appropriateness 

HPSG defines both sort hierarchies and the features appropriate to each sort. LIFE makes it possible 
to follow HPSG's definitions closely by (i) defining the sort inheritance hierarchy and (ii) 
constraining the features associated with the sorts. For example, we declare the sort substantive, 
whose subsorts are noun, verb, adjective, preposition and relativizer. Then we state the constraint 
on the sort category, whose HEAD feature must be of a sort lower than substantive. 

    substantive := noun; verb; adj; prep; reltvzr. 

    :: C:category(head => substantive) | C.head :< substantive. 

However, LIFE allows feature structures to be enlarged dynamically, unlike HPSG where 
feature structures are canonical. To constrain LIFE to allow at most the features appropriate to a 
structure, we state some additional constraints. 

    C:category | Imember(features(C), [head, valence, marking]). 

The sort category is then constrained to have at most the head, valence and marking 
features, as defined in the theory. The Imember predicate traverses the list of authorized features 
and succeeds if all the features of C belong to it. 
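
The appropriateness check just described can be paraphrased as follows. This is a Python sketch, 
not the LIFE definition: terms are modelled as plain dictionaries with a "sort" key, and the names 
APPROPRIATE, features, all_members and well_formed are assumptions introduced for illustration. 
Unlike the active constraint above, this function has to be called explicitly. 

```python
# Features allowed on each sort; only category is listed, as in the text.
APPROPRIATE = {"category": ["head", "valence", "marking"]}


def features(term):
    """Return the feature names currently present on a term (a dict here)."""
    return [name for name in term if name != "sort"]


def all_members(items, allowed):
    """Succeed if every item belongs to the list of allowed names."""
    return all(item in allowed for item in items)


def well_formed(term):
    """Check that a term carries at most the features appropriate to its sort."""
    return all_members(features(term), APPROPRIATE[term["sort"]])
```

A category term carrying only head and valence passes the check, since the constraint is an upper 
bound on the feature set; adding a feature outside the list, such as a hypothetical agr, makes it 
fail. 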

4 Conclusion 



Our approach shows that a strong direct interpretation can be efficient in several respects. First, 
the implementation framework is an actual programming language, which avoids the development of 
translation tools. Second, a direct interpretation makes the systems easier to maintain and reuse, 
in particular because the generality of the theoretical framework is preserved. Finally, the 
constraint programming paradigm offers properties that are particularly useful for knowledge 
representation and control. To summarize, the implementation of linguistic constraints using active 
constraints is concise, faithful and efficient. 



References 

[Ait-Kaci86] Ait-Kaci H. & R. Nasr (1986), "Login: A Logic Programming Language with Built-in 
Inheritance", in Journal of Logic Programming, 1986:3. 

[Ait-Kaci91] Ait-Kaci H. & A. Podelski (1991), "Towards a Meaning of LIFE", in Proceedings of the 3rd 
International Symposium on Programming Language Implementation and Logic Programming, 
Springer-Verlag. 

[Blache95] Blache P. & N. Hathout (1995), "Constraint Logic Programming for NLP", in Proceedings of 
the 5th International Workshop on Natural Language Understanding and Logic Programming. 

[Carpenter92] Carpenter B. (1992), The Logic of Typed Feature Structures, Cambridge University Press. 

[Carpenter93] Carpenter B. (1993), ALE - The Attribute Logic Engine. User's Guide, CMU-LCL Report. 

[Carpenter94] Carpenter B. & G. Penn (1994), "Compiling Typed Attribute-Value Logic Grammars", 
Technical Report, Carnegie Mellon University. 

[Evans87] Evans R. (1987), Theoretical and Computational Interpretations of GPSG, PhD Thesis, 
University of Sussex. 

[Fong91] Fong S. (1991), Computational Properties of Principle-Based Grammatical Theories, PhD Thesis, 
MIT. 

[Götz95] Götz T. & D. Meurers (1995), "Compiling HPSG Type Constraints into Definite Clause 
Programs", in Proceedings of ACL '95. 

[Minnen95] Minnen G., D. Gerdemann & T. Götz (1995), "Off-line Optimization for Earley-style HPSG 
Processing", in Proceedings of EACL '95. 

[Pollard & Sag94] Pollard C. & I. Sag (1994), Head-driven Phrase Structure Grammars, CSLI Lecture 
Notes, Chicago University Press. 

[Popowich91] Popowich F. & C. Vogel (1991), "Logic-Based Implementation of HPSG", in Natural 
Language Understanding and Logic Programming III, C. Brown & K. Koch eds., North Holland. 






